This is a storyboard project using death-row data from the Texas Department of Criminal Justice, or TDCJ.
TDCJ keeps keeps records of every inmate they execute.
In this project, I scrape the data from the HTML table, which contains some demographic information on each inmate. I use the Selector Gadget to locate and then build urls to the info and last statement for each inmate.
With some help from the purrr package, I
was able to download all of the .jpg images from the
website. We perform some image manipulation with the magick
package.
HTML web-scraping packages
library(rvest)
library(jsonlite)
library(xopen)
library(xml2)Data wrangling/processing packages
library(dplyr)
library(tidyr)
library(janitor)
library(forcats)
library(lubridate)
library(stringr)
library(scales)
library(fs)Graphs
library(paletteer)
library(ggthemes)
library(patchwork)Shiny/dashboard packages
library(knitr)
library(flexdashboard)
library(gtsummary)Image manipulation/processing
library(grid)
library(magick)
library(ggpubr)
library(jpeg)
library(paletteer)
library(ggthemes)
library(patchwork)
library(ggrepel)Read HTML from URL
Load the xml2 package and define the url with the data
(here it’s webpage_url).
webpage_url <- "http://www.tdcj.state.tx.us/death_row/dr_executed_offenders.html"
webpage <- xml2::read_html(webpage_url)Exract HTML tables
Use the rvest::html_table() to find the table in the
webpage object. This is at position [[1]].
ExOffndrsRaw <- rvest::html_table(webpage)[[1]] The dplyr::glimpse(40) function is helpful here.
# check the data.frame
ExOffndrsRaw |> dplyr::glimpse(60)Rows: 576
Columns: 10
$ Execution <int> 576, 575, 574, 573, 572, 571, 570, 56…
$ Link <chr> "Inmate Information", "Inmate Informa…
$ Link <chr> "Last Statement", "Last Statement", "…
$ `Last Name` <chr> "Ramirez", "Chanthakoummane", "Buntio…
$ `First Name` <chr> "John", "Kosol", "Carl", "Rick", "Joh…
$ TDCJNumber <int> 999544, 999529, 993, 999049, 999567, …
$ Age <int> 38, 41, 78, 57, 45, 41, 45, 47, 64, 4…
$ Date <chr> "10/5/2022", "8/17/2022", "4/21/2022"…
$ Race <chr> "Hispanic", "Other", "White", "White"…
$ County <chr> "Nueces", "Collin", "Harris", "Harris…
Fix the column names
We can see the Link column is repeated, which is going
to be a problem when we put these data into their own
tibble because R doesn’t like to repeat the column names
inside a data.frame.
We will use the tibble::as_tibble() function, but add
the .name_repair = "unique" argument. The
.name_repair argument has other options
("check_unique", "unique",
"universal" and "minimal"), and you can read
the help files using ?as_tibble.
In this case, "unique" will work just fine.
ExecutedOffenders <- rvest::html_table(webpage)[[1]] |>
# repair the repeated columns
tibble::as_tibble(.name_repair = "unique") |>
# get unique names
janitor::clean_names(case = "snake") |>
# lower, snake case
dplyr::rename(offender_info = link_2,
# rename these
last_statement = link_3)ExecutedOffenders |> glimpse(60)Rows: 576
Columns: 10
$ execution <int> 576, 575, 574, 573, 572, 571, 570, …
$ offender_info <chr> "Inmate Information", "Inmate Infor…
$ last_statement <chr> "Last Statement", "Last Statement",…
$ last_name <chr> "Ramirez", "Chanthakoummane", "Bunt…
$ first_name <chr> "John", "Kosol", "Carl", "Rick", "J…
$ tdcj_number <int> 999544, 999529, 993, 999049, 999567…
$ age <int> 38, 41, 78, 57, 45, 41, 45, 47, 64,…
$ date <chr> "10/5/2022", "8/17/2022", "4/21/202…
$ race <chr> "Hispanic", "Other", "White", "Whit…
$ county <chr> "Nueces", "Collin", "Harris", "Harr…
The executed offenders table:
Identify the links with selector gadget
In order to get the nodes from the table, we need to
send webpage through a few passes of rvest
functions (html_nodes and html_attr) with
various css tags to get the correct URL paths. This took a
few tries and some trial and error, but eventually I was able to figure
out the the correct combinations to get the Links to the
pages.
Links <- webpage |>
# this get the links in the overflow table
# row
rvest::html_nodes(".overflow tr") |>
# the links
rvest::html_nodes("a") |>
# the header ref
rvest::html_attr("href")# check Links
Links |> utils::head(20) [1] "dr_info/ramirezjohn.html"
[2] "dr_info/ramirezjohnlast.html"
[3] "dr_info/chanthakoummanekosoul.html"
[4] "dr_info/chanthakoummanekosollast.html"
[5] "dr_info/buntionc.jpg"
[6] "dr_info/buntioncarllast.html"
[7] "dr_info/rhoadesrick.html"
[8] "dr_info/rhoadesricklast.html"
[9] "dr_info/hummeljohn.html"
[10] "dr_info/hummeljohnlast.html"
[11] "dr_info/jonesquintin.html"
[12] "dr_info/jonesquintinlast.html"
[13] "dr_info/wardlowbilly.html"
[14] "dr_info/wardlowbillylast.html"
[15] "dr_info/ochoaabel.html"
[16] "dr_info/ochoaabellast.html"
[17] "dr_info/gardnerjohn.html"
[18] "dr_info/gardnerjohnlast.html"
[19] "dr_info/runnelstravis.html"
[20] "dr_info/runnelstravislast.html"
Now Links contain:
A dr_info/ path (which makes the entire path
"http://www.tdcj.state.tx.us/death_row/dr_info/").
Every offender has two links–one with their full name, the other
with a last string attached to the back of their full
name.
Something tells me if I check the base::length() of
Links with the base::nrow()s in
ExOffndrs…there will be twice as many links as rows in
executed offenders.
length(Links)[1] 1152
nrow(ExecutedOffenders)[1] 576
Good–this is what I want. That means each row in
ExecutedOffenders has two links associated with their
name.
Create last_links statements
The stringr package can help me wrangle this long vector
into the last_pattern logical vector, which I then use to
subset the Links.
last_pattern <- stringr::str_detect(
string = Links,
pattern = "last")
utils::head(Links[last_pattern])[1] "dr_info/ramirezjohnlast.html"
[2] "dr_info/chanthakoummanekosollast.html"
[3] "dr_info/buntioncarllast.html"
[4] "dr_info/rhoadesricklast.html"
[5] "dr_info/hummeljohnlast.html"
[6] "dr_info/jonesquintinlast.html"
Check to see that Links[last_pattern] is same length as
the number of rows in ExecutedOffenders…
base::identical(
x = base::length(Links[last_pattern]),
y = base::nrow(ExecutedOffenders))[1] TRUE
Great–subset the Links for the
last_pattern, then give this vector a name
(last_links).
last_links <- Links[last_pattern]
last_links |> utils::head(10) [1] "dr_info/ramirezjohnlast.html"
[2] "dr_info/chanthakoummanekosollast.html"
[3] "dr_info/buntioncarllast.html"
[4] "dr_info/rhoadesricklast.html"
[5] "dr_info/hummeljohnlast.html"
[6] "dr_info/jonesquintinlast.html"
[7] "dr_info/wardlowbillylast.html"
[8] "dr_info/ochoaabellast.html"
[9] "dr_info/gardnerjohnlast.html"
[10] "dr_info/runnelstravislast.html"
If I check the length of items in last_links, I can see
there are an identical number of rows in the data frame.
base::identical(
x = base::length(last_links),
y = base::nrow(ExecutedOffenders))[1] TRUE
Assign the last_url column to
ExecutedOffenders
This means I can easily assign these as a new column in
ExecutedOffenders.
ExecutedOffenders |> glimpse(40)Rows: 576
Columns: 10
$ execution <int> 576, 575, 574, …
$ offender_info <chr> "Inmate Informa…
$ last_statement <chr> "Last Statement…
$ last_name <chr> "Ramirez", "Cha…
$ first_name <chr> "John", "Kosol"…
$ tdcj_number <int> 999544, 999529,…
$ age <int> 38, 41, 78, 57,…
$ date <chr> "10/5/2022", "8…
$ race <chr> "Hispanic", "Ot…
$ county <chr> "Nueces", "Coll…
Not done yet–I need to add the beginning of the web address:
https://www.tdcj.texas.gov/death_row/
# test
ExecutedOffenders |>
dplyr::mutate(
last_url =
paste0("https://www.tdcj.texas.gov/death_row/",
last_links)) |>
dplyr::pull(last_url) |>
utils::head(10) |>
base::writeLines()https://www.tdcj.texas.gov/death_row/dr_info/ramirezjohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/chanthakoummanekosollast.html
https://www.tdcj.texas.gov/death_row/dr_info/buntioncarllast.html
https://www.tdcj.texas.gov/death_row/dr_info/rhoadesricklast.html
https://www.tdcj.texas.gov/death_row/dr_info/hummeljohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/jonesquintinlast.html
https://www.tdcj.texas.gov/death_row/dr_info/wardlowbillylast.html
https://www.tdcj.texas.gov/death_row/dr_info/ochoaabellast.html
https://www.tdcj.texas.gov/death_row/dr_info/gardnerjohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/runnelstravislast.html
# assign
ExecutedOffenders <- ExecutedOffenders |>
dplyr::mutate(
last_url =
paste0("https://www.tdcj.texas.gov/death_row/",
last_links))Now we will tidy these up into nice, clean LastUrl
tibble.
LastUrl <- last_links |>
tibble::as_tibble(.name_repair =
"unique") |>
tidyr::gather(key = "key",
value = "value") |>
dplyr::select(name_last_url = value) |>
dplyr::mutate(name_last_url =
paste0("https://www.tdcj.texas.gov/death_row/",
last_links))
LastUrl$name_last_url |>
utils::head() |>
base::writeLines()https://www.tdcj.texas.gov/death_row/dr_info/ramirezjohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/chanthakoummanekosollast.html
https://www.tdcj.texas.gov/death_row/dr_info/buntioncarllast.html
https://www.tdcj.texas.gov/death_row/dr_info/rhoadesricklast.html
https://www.tdcj.texas.gov/death_row/dr_info/hummeljohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/jonesquintinlast.html
Test one of the URLs out in the browser.
xopen("https://www.tdcj.texas.gov/death_row/dr_info/swearingenlarrylast.html")Create the info pattern
Now I want the offender information links (so I omit the links with
last in the pattern).
info_pattern <- !stringr::str_detect(
string = Links,
pattern = "last")
Links[info_pattern] |>
utils::head() |>
base::writeLines()dr_info/ramirezjohn.html
dr_info/chanthakoummanekosoul.html
dr_info/buntionc.jpg
dr_info/rhoadesrick.html
dr_info/hummeljohn.html
dr_info/jonesquintin.html
Verify length and rows
Check the base::length() to see if it’s identical to the
number of rows in ExecutedOffenders.
base::identical(x = base::length(Links[info_pattern]),
y = base::nrow(ExecutedOffenders))[1] TRUE
Great!
Check the length() of info_links
info_links <- Links[info_pattern]
base::identical(x = base::length(info_links),
y = base::nrow(ExecutedOffenders))[1] TRUE
These are also identical. Repeat the URL process from above on the
info_url
Create info_url column
Now we combine this with the
https://www.tdcj.texas.gov/death_row/ URL.
ExecutedOffenders |>
dplyr::mutate(
info_url =
paste0("https://www.tdcj.texas.gov/death_row/",
info_links)) |>
dplyr::pull(last_url) |>
utils::head(10) |>
base::writeLines()https://www.tdcj.texas.gov/death_row/dr_info/ramirezjohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/chanthakoummanekosollast.html
https://www.tdcj.texas.gov/death_row/dr_info/buntioncarllast.html
https://www.tdcj.texas.gov/death_row/dr_info/rhoadesricklast.html
https://www.tdcj.texas.gov/death_row/dr_info/hummeljohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/jonesquintinlast.html
https://www.tdcj.texas.gov/death_row/dr_info/wardlowbillylast.html
https://www.tdcj.texas.gov/death_row/dr_info/ochoaabellast.html
https://www.tdcj.texas.gov/death_row/dr_info/gardnerjohnlast.html
https://www.tdcj.texas.gov/death_row/dr_info/runnelstravislast.html
# assign
ExecutedOffenders <- ExecutedOffenders |>
dplyr::mutate(
info_url =
paste0("http://www.tdcj.state.tx.us/death_row/",
info_links))These are complete URLs–assign this to ExecutedOffenders
data frame. Put the InfoLinks into a tidy data frame.
info_links <- Links[info_pattern]
InfoLinks <- info_links |>
# turn into a tibble
tibble::as_tibble(.name_repair = "unique") |>
# tidy
tidyr::gather(key = "key",
value = "value") |>
# rename the value
dplyr::select(dr_info_url = value) |>
# create the new url with death row root
dplyr::mutate(
dr_info_url = paste0(
"http://www.tdcj.state.tx.us/death_row/",
info_links))InfoLinks |> dplyr::glimpse(60)Rows: 576
Columns: 1
$ dr_info_url <chr> "http://www.tdcj.state.tx.us/death_row…
Check in browser
Test a few of these out in the browser:
xopen("http://www.tdcj.state.tx.us/death_row/dr_info/brookscharlie.html")Now we assign these links to the ExecutedOffenders data
frame. But first make sure they match up.
ExecutedOffenders |>
dplyr::select(last_name,
first_name) |>
utils::head(10)# A tibble: 10 × 2
last_name first_name
<chr> <chr>
1 Ramirez John
2 Chanthakoummane Kosol
3 Buntion Carl
4 Rhoades Rick
5 Hummel John
6 Jones Quintin
7 Wardlow Billy
8 Ochoa Abel
9 Gardner John
10 Runnels Travis
ExecutedOffenders |>
dplyr::select(last_name,
first_name) |>
utils::tail(10)# A tibble: 10 × 2
last_name first_name
<chr> <chr>
1 Rumbaugh Charles
2 Porter Henry
3 Milton Charles
4 De La Rosa Jesse
5 Morin Stephen
6 Skillern Doyle
7 Barefoot Thomas
8 O'Bryan Ronald
9 Autry James
10 Brooks, Jr. Charlie
Bind columns
Combine the ExecutedOffenders, LastUrl and
InfoLinks.
# Use `dplyr::bind_cols()` to attach these columns
# to `ExecutedOffenders` and rename to`ExOffndrsComplete`
ExecutedOffenders <- ExecutedOffenders |>
# add the info_url
dplyr::bind_cols(LastUrl) |>
# add the
dplyr::bind_cols(InfoLinks) |>
# move the names to the front
dplyr::select(dplyr::ends_with("name"),
# all else
dplyr::everything())ExecutedOffenders |> dplyr::glimpse(60)Rows: 576
Columns: 14
$ last_name <chr> "Ramirez", "Chanthakoummane", "Bunt…
$ first_name <chr> "John", "Kosol", "Carl", "Rick", "J…
$ execution <int> 576, 575, 574, 573, 572, 571, 570, …
$ offender_info <chr> "Inmate Information", "Inmate Infor…
$ last_statement <chr> "Last Statement", "Last Statement",…
$ tdcj_number <int> 999544, 999529, 993, 999049, 999567…
$ age <int> 38, 41, 78, 57, 45, 41, 45, 47, 64,…
$ date <chr> "10/5/2022", "8/17/2022", "4/21/202…
$ race <chr> "Hispanic", "Other", "White", "Whit…
$ county <chr> "Nueces", "Collin", "Harris", "Harr…
$ last_url <chr> "https://www.tdcj.texas.gov/death_r…
$ info_url <chr> "http://www.tdcj.state.tx.us/death_…
$ name_last_url <chr> "https://www.tdcj.texas.gov/death_r…
$ dr_info_url <chr> "http://www.tdcj.state.tx.us/death_…
Create indicator for .html vs .jpgs
Create a binary variable to identify if this is a .jpg
or .html path and name the new data frame
ExOffndrsComplete.
Use case_when() to create
jpg_html
ExOffndrsComplete <- ExecutedOffenders |>
dplyr::mutate(jpg_html =
dplyr::case_when(
str_detect(string = info_url, pattern = ".jpg") ~ "jpg",
str_detect(string = info_url, pattern = ".html") ~ "html")) ExOffndrsComplete |> dplyr::count(jpg_html)# A tibble: 2 × 2
jpg_html n
<chr> <int>
1 html 195
2 jpg 381
Download the selector gadget app for your browser. You can identify the various elements in a webpage using the selector gadget. Read this tutorial to see how it works.
Use the selector gadget to locate the css tags for
overflow, a
Now that we’ve downloaded the table of executed offenders from the Texas
Department of Criminal Justice website and rebuilt the URLs to their
last statements and info, we are going to download the mugshots that are
stored as .jpgs.
ExOffndrsComplete |> dplyr::glimpse(60)Rows: 576
Columns: 15
$ last_name <chr> "Ramirez", "Chanthakoummane", "Bunt…
$ first_name <chr> "John", "Kosol", "Carl", "Rick", "J…
$ execution <int> 576, 575, 574, 573, 572, 571, 570, …
$ offender_info <chr> "Inmate Information", "Inmate Infor…
$ last_statement <chr> "Last Statement", "Last Statement",…
$ tdcj_number <int> 999544, 999529, 993, 999049, 999567…
$ age <int> 38, 41, 78, 57, 45, 41, 45, 47, 64,…
$ date <chr> "10/5/2022", "8/17/2022", "4/21/202…
$ race <chr> "Hispanic", "Other", "White", "Whit…
$ county <chr> "Nueces", "Collin", "Harris", "Harr…
$ last_url <chr> "https://www.tdcj.texas.gov/death_r…
$ info_url <chr> "http://www.tdcj.state.tx.us/death_…
$ name_last_url <chr> "https://www.tdcj.texas.gov/death_r…
$ dr_info_url <chr> "http://www.tdcj.state.tx.us/death_…
$ jpg_html <chr> "html", "html", "jpg", "html", "htm…
purrr and iteration
We will use purrrs iteration tools to download the
images attached to the website profiles. Follow the three
purrr steps from the workshop by Charlotte
Wickham. See the accompanying slides,
too.
We’ll go over these steps below:
purrr 1) Do ‘it’ for one
element
First we need a test image (test_image) from the
jpg_html column:
test_image <- ExOffndrsComplete |>
# only jpg row
dplyr::filter(jpg_html == "jpg") |>
# pull the info url column
dplyr::select(info_url) |>
# sample 1
dplyr::sample_n(size = 1) |>
# convert to character
base::as.character() We can test the new url columns in the ExecOffenders
with the magick::image_read() function.
# pass test_image to image_read()
magick::image_read(test_image)You should see an image in the RStudio viewer pane
2. Turn ‘it’ into a recipe
Use dplyr::filter to select only the .jpgs
in the ExOffndrsComplete and create
ExOffndrsCompleteJpgs.
Pull the urls into a vector (jpg_url), then create a
folder to download them into (jpg_path).
ExOffndrsCompleteJpgs <- ExOffndrsComplete |>
dplyr::filter(jpg_html == "jpg")
jpg_url <- ExOffndrsCompleteJpgs$info_url
fs::dir_create("inst/extdata/jpgs/")
jpg_path <- paste0("inst/extdata/jpgs/", base::basename(jpg_url))
jpg_path |> utils::head()3. Use purrr::purrr::walk2() to download all
files
Now use the purrr::walk2() function to download the
files. How does walk2 work?
First look at the arguments for
utils::download.file().
?utils::download.fileHow to walk2()
The help files tell us the walk2 function is
“specialized for the two argument case”. So .x and
.y become the two arguments we need to iterate over
download.file(). We will walk through this step-by-step
below:
.x = the file path, which we created with the
selector gadget above (in jpg_url)
.y = the location we want the files to end up
(jpg_path), and
the function we want to iterate over .x and
.y (download.file).
When we pass everything to purrr::walk2, R will go to
the URL, download the file located at the URL, and put it in the
associated jpgs/ folder.
Download .jpg files
Execute the code below and you will see the .jpgs downloading into
the jpg folder.
purrr::walk2(.x = jpg_url,
.y = jpg_path,
.f = download.file)You should see the following in your console.
You will see text similar to the content below.
# trying URL 'http://www.tdcj.state.tx.us/death_row/dr_info/robisonlarry.jpg'
# Content type 'image/jpeg' length 108341 bytes (105 KB)
# ==================================================
# downloaded 105 KB
#
# trying URL 'http://www.tdcj.state.tx.us/death_row/dr_info/hicksdavid.jpg'
# Content type 'image/jpeg' length 139150 bytes (135 KB)
# ==================================================
# downloaded 135 KBThis might take awhile, but when its done, check the number of files in this folder.
fs::dir_info("./inst/extdata/jpgs/") |>
tibble::as_tibble() |>
dplyr::arrange(desc(size)) |>
dplyr::select(path, type, size) |>
dplyr::glimpse(60)
# Rows: 381
# Columns: 3
# $ path <fs::path> "./inst/extdata/jpgs/riddlegranville.jpg…
# $ type <fct> file, file, file, file, file, file, file, fil…
# $ size <fs::bytes> 244K, 225K, 202K, 195K, 190K, 189K, 189…There you have it! 381 images of downloaded offenders!
magick::image_read()You should see the .jpg file in the RStudio Viewer pane
after running magick::image_read()
purrr::walk2()When downloading the .jpg files with
purrr::walk2(), you should see this in your console:
age and raceExecutions by age and
race
| Characteristic | N = 5761 |
|---|---|
| age | 39 (33, 45) |
| race | |
| Black | 206 (36%) |
| Hispanic | 110 (19%) |
| Other | 3 (0.5%) |
| White | 257 (45%) |
| 1 Median (IQR); n (%) | |
date and monthExecutions by date and
month
| Characteristic | N = 5761 |
|---|---|
| date | 1982-12-07 to 2022-10-05 |
| month | |
| Jan | 61 (11%) |
| Feb | 45 (7.8%) |
| Mar | 49 (8.5%) |
| Apr | 43 (7.5%) |
| May | 59 (10%) |
| Jun | 58 (10%) |
| Jul | 34 (5.9%) |
| Aug | 52 (9.0%) |
| Sep | 52 (9.0%) |
| Oct | 39 (6.8%) |
| Nov | 50 (8.7%) |
| Dec | 34 (5.9%) |
| 1 Range; n (%) | |
Executions by race and
year
I used the magik package for processing and manipulating the mugshot images. I advise checking out the entire vignette for more examples.
Create test image
The executed offender we’ll include in the plot is Billy Joe Woods from Waller County, Texas. He was executed on April 14th, 1997. The LA Times covered this case in an article titled, Executions in Texas: No Big Deal.
“At this rate, with 12 executions already, 1997 is expected to become the busiest year ever on Texas’ death row–or any other state’s, for that matter–surpassing the previous record of 19, set here in 1995. If and when that happens–most likely next month–it could very well reignite the debate over capital punishment across the nation, inspiring activists on both sides of the issue alternately to hail and condemn the milestone.”
I create test_magick_img from
magick::image_read(), and then go on making the
transformations as necessary.
"https://www.tdcj.texas.gov/death_row/dr_info/woodsbillyjoe.jpg" -> test_imagetest_magick_img <- magick::image_read(test_image)To remove the text and focus on the mugshot. This might need to be
adjusted slightly for each new test_magick_img.
# crop this image
crop_750_1000_10 <- magick::image_crop(
image = test_magick_img,
geometry = "750x1000+10"
)This trims the extra space off the bottom of the image.
Rotate with magick::image_rotate()
I want to rotate this image by 90 degrees.
# rotate this image
crop_750_1000_10_rotate90 <- magick::image_rotate(
crop_750_1000_10,
degrees = 90
)Remove the rest of the text and focus on the mugshot. This might need to be adjusted slightly for each new image
# crop this image
crop_850_950_60 <- magick::image_crop(
image = crop_750_1000_10_rotate90,
geometry = "850x950+60"
)Rotate this image back to center (image_rotate again)
and flip it using magick::image_flip()
# rotate this image
crop_850_950_60_rotate270 <- magick::image_rotate(
crop_850_950_60,
degrees = 270)
# rotate this image
crop_850_950_60_flip <- magick::image_flip(crop_850_950_60_rotate270)Crop the rest of the text out of the image, trim the white-space for the plot, and flip the image.
# crop this image
crop_750_255_300 <- magick::image_crop(
image = crop_850_950_60_flip,
geometry = "750x255+300"
)
# flip this image again
crop_850_950_60_flip <- magick::image_flip(crop_750_255_300)Crop out the last little bit of the document.
# crop the dot out
crop_435_352_1 <- magick::image_crop(
image = crop_850_950_60_flip,
geometry = "435x352+1"
)Image effects with magick
We will use magick::image_trim() to clean the image up a
bit.
# Here we will trim the image up a bit with the `fuzz` argument
test_magick_clean <- magick::image_trim(
image = crop_435_352_1,
fuzz = 1
)Now that I have all the trimming on and cropping done, I will add
some effects for the ggplot2 image.
I want the image to have a bit more contrast, so I will use
magick::image_contrast() to create these effects.
test_magick_final <- magick::image_contrast(image = test_magick_clean,
sharpen = TRUE
)Export .jpg
Export the image as a .jpg
magick::image_write(test_magick_final,
path = "www/test_magick_final.jpg")If we print test_magick_img to the console, the images
comes up in the viewer pane.
Original image
Final image
I create TotalExecYear as the data for the image to
appear on top of.
TotalExecYear <- ExOffndrsComplete |>
dplyr::count(year, sort = TRUE)
head(TotalExecYear)# A tibble: 6 × 2
year n
<dbl> <int>
1 2000 40
2 1997 37
3 1999 35
4 2002 33
5 2007 26
6 2003 24
Import .jpg
library(jpeg)
imgJPEG <- jpeg::readJPEG("www/test_magick_final.jpg")Build labels
labs_bjw <- labs(
title = "Billy Joe Woods was the twelfth execution in Texas in 1997",
subtitle = "1997 is expected to become the busiest year ever on Texas’ death row",
caption = "https://www.latimes.com/archives/la-xpm-1997-05-20-mn-60637-story.html",
x = "Year",
y = "Total Executions"
)Build a base plot
ggp2_bjw_base <- ggplot(
data = TotalExecYear,
aes(x = factor(year),
y = n)) Create background_image() with ggpubr
ggp2_bjw_bg_img <- ggpubr::background_image(imgJPEG) Create column graph
ggp2_bjw_col <- geom_col(aes(fill = n),
position = "stack",
show.legend = FALSE) Build x scale
ggp2_bjw_x_discrete <- scale_x_discrete(
breaks = c("1980","1985", "1990", "1995",
"2000", "2005", "2010", "2015", "2020"))Set colors with paletteer
ggp2_bjw_fill_pal <- paletteer::scale_fill_paletteer_c("ggthemes::Red") Add labels using ggrepel
We also want to label the specific point on the graph where Billy Joe
Woods was executed. We can do this by filtering the
ExecOffenders to only his row in that dataset, then
creating a label variable with paste0(). We
now want to limit the columns in ExecOffLabel to only
those values on the graph, which are year,
n, and label
ExecOffLabel <- ExOffndrsComplete %>%
filter(last_name == "Woods" & first_name == "Billy") %>%
mutate(
year = factor(year),
label = paste0(last_name, ", ", first_name, ": executed on ", date),
n = 12) %>%
select(year, n, label)
ExecOffLabel# A tibble: 1 × 3
year n label
<fct> <dbl> <chr>
1 1997 12 Woods, Billy: executed on 1997-04-14
We will add the geom_label_repel() layer with
ExecOffLabelData and increase the size of the
label so it stands out.
ggp2_bjw_lab_repel <- ggrepel::geom_label_repel(
data = ExecOffLabel,
aes(x = year,
y = n,
label = label),
# set color and size...
color = "black",
size = 2) Build plot
ggp2_bjw_base +
ggp2_bjw_bg_img +
ggp2_bjw_col +
ggp2_bjw_x_discrete +
ggp2_bjw_fill_pal +
labs_bjw +
theme_minimal()Add labels
ggp2_bjw_base +
ggp2_bjw_bg_img +
ggp2_bjw_col +
ggp2_bjw_lab_repel +
ggp2_bjw_x_discrete +
ggp2_bjw_fill_pal +
labs_bjw +
ggpubr::theme_pubclean()